MDEV-38877: Unnecessary filesort on derived table materialization #4722
OmarGamal10 wants to merge 1 commit into MariaDB:12.3
Fixes an unnecessary filesort on derived tables when the result is ordered/grouped by a field in the key. The data is already sorted in that case, so wrapping the result set in a filesort is redundant.
From discussion, from @OmarGamal10, that I hope I can get @mariadb-RexJohnston to provide guidance on: "I've verified that the skipped filesorts in the failing tests are correct; the cases are similar to what I worked on, and I verified correctness by modifying the test locally to check the returned set. The filter estimate, on the other hand, was for some reason correct before my fix: I checked the real filtered percentage and the old estimate was right. Also, for this query the plan showed that the materialized table is lateral derived, not just derived, so I figured that something related to it being lateral, together with my fix (not flagging the constant), messed the estimate up. It's been about 2.5 hours of trying to find in the code how to tell whether a table is both derived and lateral, so that I can add it to the condition. There are variables for this, but for some reason none of them gets set at all. I think that is because the query text has no LATERAL keyword; it is just a subquery that the optimizer probably rewrote to be lateral, which I can't seem to catch at all."
Hi, you are right to point out the involvement of the split_materialized optimizer flag. If we switch it off, our problem vanishes. Indeed, at the point where the problematic table->const_key_parts is set, the stack trace shows that we are calling sort_and_filter_keyuse() from JOIN::add_keyuses_for_splitting(), yet split materialization isn't used. test_if_order_by_key() later uses this field to determine whether a temporary table is required. It would be reasonable to conclude that const_key_parts is set above for accessing t1 from t2 using split materialization. Since we did not decide to use this access method, the safest fix would be to reset this field to the value it had before the call to JOIN::add_keyuses_for_splitting(), once we have decided NOT to use split materialization, perhaps in JOIN_TAB::fix_splitting().
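The save-and-restore idea in this comment can be sketched in isolation. This is only a toy model, not MariaDB code: `ToyTable`, `ConstKeyPartsSaver` and `consider_splitting` are hypothetical names invented for illustration, and the only thing borrowed from the server is the notion of a `const_key_parts` bitmap.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for the per-table optimizer state; not the real TABLE struct.
struct ToyTable {
  std::uint64_t const_key_parts = 0;  // bit i set => key part i treated as constant
};

// Remembers const_key_parts on construction and puts it back on destruction
// unless keep() was called.
class ConstKeyPartsSaver {
 public:
  explicit ConstKeyPartsSaver(ToyTable &t)
      : table_(t), saved_(t.const_key_parts) {}
  ~ConstKeyPartsSaver() {
    if (!keep_) table_.const_key_parts = saved_;
  }
  void keep() { keep_ = true; }
  ConstKeyPartsSaver(const ConstKeyPartsSaver &) = delete;
  ConstKeyPartsSaver &operator=(const ConstKeyPartsSaver &) = delete;

 private:
  ToyTable &table_;
  std::uint64_t saved_;
  bool keep_ = false;
};

// Hypothetical planning step: key part 0 is flagged while splitting is being
// considered, and the flag survives only if splitting is actually chosen.
inline void consider_splitting(ToyTable &t, bool splitting_chosen) {
  ConstKeyPartsSaver saver(t);
  t.const_key_parts |= 1u;
  if (splitting_chosen) saver.keep();
}
```

Calling `consider_splitting(t, false)` leaves `t.const_key_parts` as it was before the call, which is the behaviour the comment asks for when split materialization is rejected.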
Why does it happen?
The optimizer flags outer references in subqueries as constants. On every re-execution of the subquery, filtering on that constant means all rows already have the same value in the column, so the index is skipped for ordering.
Consider this example:
SELECT * FROM t2 JOIN (SELECT groups_20, MAX(b) FROM t1 GROUP BY groups_20) DT ON t2.a = groups_20;
After hours of investigation, I found that the index is bypassed because table->const_key_parts incorrectly flags the [GROUP/ORDER] BY column as a constant. This optimization is correct for Nested Loop / Lateral joins: since the subquery is re-executed for each outer row, the join column is a literal constant in this context, making index usage/sorting redundant.
However, if the optimizer decides to materialize the subquery, the subquery is executed once to build a table. In this context, the column is a variable, not a constant.
The fix is a guard condition to prevent treating an outer reference as a constant in case of derived tables.
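To make the effect of the flag concrete, here is a small self-contained toy model. It is not the patch itself: `ToyKey`, `mark_const_if_safe` and `key_provides_order` are hypothetical names, the index on (groups_20, b) is an assumption made for the example, and the order check only loosely imitates the idea of skipping key parts that are flagged as constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for an index: an ordered list of column names plus a bitmap
// of key parts the optimizer believes are constant.
struct ToyKey {
  std::vector<std::string> parts;
  std::uint64_t const_key_parts = 0;  // bit i set => parts[i] treated as constant
};

// Hypothetical guard: an equality against an outer reference makes the key
// part constant only when the subquery is re-executed for each outer row.
inline void mark_const_if_safe(ToyKey &key, std::size_t part,
                               bool is_outer_ref, bool materialized_once) {
  if (is_outer_ref && materialized_once)
    return;  // the column varies within the single materialization pass
  key.const_key_parts |= (std::uint64_t{1} << part);
}

// Returns true when reading the index in key order already satisfies
// ORDER/GROUP BY on order_cols, so no filesort would be needed.
inline bool key_provides_order(const ToyKey &key,
                               const std::vector<std::string> &order_cols) {
  std::size_t part = 0;
  for (const std::string &col : order_cols) {
    // Skip key parts flagged as constant; they are assumed not to affect order.
    while (part < key.parts.size() && ((key.const_key_parts >> part) & 1u))
      ++part;
    if (part == key.parts.size() || key.parts[part] != col)
      return false;
    ++part;
  }
  return true;
}
```

In this model, flagging groups_20 as constant makes `key_provides_order` skip past it and fail to match the GROUP BY column, which corresponds to the redundant filesort. With the guard active for a once-materialized derived table, the bit stays clear and the key order is accepted.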
Before